6 research outputs found

Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

Author: Idakwo Gabriel
Publication venue: The Aquila Digital Community
Publication date: 01/08/2020

In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms combined with bagging as an ensemble strategy were evaluated. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., > 40). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention.
The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features have a correlation to biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based Quantitative Structure-Activity/Property Relationship (QSAR) efforts for enhanced drug discovery and toxicology assessments.

Aquila Digital Community

Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals With High-Throughput Cell-Based Androgen Receptor Bioassay Data

Author: Gong Ping
Idakwo Gabriel
Luttrell Joseph, IV
Thangapandian Sundar
Zhang Chaoyang
Zhou Zhaoxian
Publication venue: The Aquila Digital Community
Publication date: 01/08/2019

Deep learning (DL) has attracted the attention of computational toxicologists as it offers a potentially greater power for in silico predictive toxicology than existing shallow learning algorithms. However, contradicting reports have been documented. To further explore the advantages of DL over shallow learning, we conducted this case study using two cell-based androgen receptor (AR) activity datasets with 10K chemicals generated from the Tox21 program. A nested double-loop cross-validation approach was adopted along with a stratified sampling strategy for partitioning chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates amongst the training, validation and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22–27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Further in-depth analyses of chemical scaffolding shed insights on structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid in future drug discovery and improvement of toxicity prediction modeling.
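The stratified partitioning mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_split`, the 60/20/20 fractions, and the toy class counts are assumptions made for illustration, and it covers only a single partition, not the nested double-loop cross-validation.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) index lists.

    Each class is shuffled and cut separately, so every subset keeps
    roughly the same class distribution as the full dataset.
    The fractions are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        cut1 = round(n * fractions[0])
        cut2 = cut1 + round(n * fractions[1])
        train.extend(indices[:cut1])
        validation.extend(indices[cut1:cut2])
        test.extend(indices[cut2:])
    return train, validation, test


# Hypothetical class counts for the four AR activity classes.
labels = (["agonist"] * 10 + ["antagonist"] * 20
          + ["inactive"] * 60 + ["inconclusive"] * 10)
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))
```

In a nested scheme, a split like this would be repeated inside an outer loop over held-out test folds, with the inner loop used for model selection.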

Aquila Digital Community

A Review On Machine Learning Methods For In Silico Toxicity Prediction

Author: Chen Minjun
Gong Ping
Hong Huixiao
Idakwo Gabriel
Luttrell Joseph, IV
Zhang Chaoyang
Zhou Zhaoxian
Publication venue: Informa UK Limited
Publication date: 10/01/2019

In silico toxicity prediction plays an important role in regulatory decision making and the selection of leads in drug design, as in vitro/vivo methods are often limited by ethics, time, budget, and other resources. Many computational methods have been employed in predicting the toxicity profile of chemicals. This review provides a detailed end-to-end overview of the application of machine learning algorithms to Structure-Activity Relationship (SAR)-based predictive toxicology. From raw data to model validation, the importance of data quality is stressed as it greatly affects the predictive power of derived models. Commonly overlooked challenges such as data imbalance, activity cliff, model evaluation, and definition of applicability domain are highlighted, and plausible solutions for alleviating these challenges are discussed.

Aquila Digital Community

Target-Specific Toxicity Knowledgebase (TsTKb): A Novel Toolkit for In Silico Predictive Technology

Author: Chen Minjun
Gong Ping
Hong Huixiao
Idakwo Gabriel
Li Yan
Thangapandian Sundar
Zhang Chaoyang
Publication venue: Informa UK Limited
Publication date: 14/11/2018

As the number of man-made chemicals increases at an unprecedented pace, efforts to quickly screen and accurately evaluate their potential adverse biological effects have been hampered by the prohibitively high costs of in vivo/vitro toxicity testing. While it is unrealistic and unnecessary to test every uncharacterized chemical, it remains a major challenge to develop alternative in silico tools with high reliability and precision in toxicity prediction. To address this urgent need, we have developed a novel mode-of-action-guided, molecular modeling-based, and machine learning-enabled modeling approach for in silico chemical toxicity prediction. Here we introduce the core element of this approach, Target-specific Toxicity Knowledgebase (TsTKb), which consists of two main components: Chemical Mode of Action (ChemMoA) database and a suite of prediction model libraries.

Aquila Digital Community

Structure–Activity Relationship-Based Chemical Classification of Highly Imbalanced Tox21 Datasets

Author: Gong Ping
Hong Huixiao
Idakwo Gabriel
Li Yan
Luttrell Joseph
Thangapandian Sundar
Wang Nan
Yang Bei
Zhang Chaoyang
Zhou Zhaoxian
Publication venue: The Aquila Digital Community
Publication date: 01/12/2020

The specificity of toxicant-target biomolecule interactions lends to the very imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples, followed by cleaning the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consisted of 12 in vitro bioassays for > 10,000 chemicals that were distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F1 score, Matthews correlation coefficient and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods.
We also found that a strong negative correlation existed between the prediction accuracy and the imbalance ratio (IR), which is defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
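The imbalance ratio defined in this abstract is simple enough to show as a worked example. The sketch below is illustrative only: the function name `imbalance_ratio`, the 0/1 label encoding, and the toy counts are assumptions, not values from the study.

```python
from collections import Counter


def imbalance_ratio(labels, active=1, inactive=0):
    """IR = number of inactive compounds / number of active compounds."""
    counts = Counter(labels)
    if counts[active] == 0:
        raise ValueError("no active compounds; IR is undefined")
    return counts[inactive] / counts[active]


# Hypothetical assay outcome: 280 inactive and 10 active compounds.
labels = [0] * 280 + [1] * 10
print(imbalance_ratio(labels))  # 28.0
```

With these made-up counts the dataset sits exactly at the example threshold of 28 quoted in the abstract.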

Aquila Digital Community

Mode-of-Action-Guided, Molecular Modeling-Based Toxicity Prediction: A Novel Approach for In Silico Predictive Toxicology

Author: A Cherkasov
A Fiser
A Maertens
A Roncaglioni
A Vedani
Abhishek K. Jain
AJ Hopfinger
AJ Williams
AN Jain
Anna Gaulton
C Da
C Hansch
C Zhang
D Mav
D Wishart
D Xu
DA Case
DB Kitchen
DG Levitt
E Boutet
E Lim
FM McRobb
FS Collins
G Daston
G Eisenbrand
Gabriel Idakwo
GH Lushington
GM Morris
GT Ankley
H Hong
H Luo
HM Berman
HW Ng
I Shah
J Ash
J Polanski
J Verma
JJ Irwin
K Wu
KS Sandhu
M Biasini
M Bouhifd
M Brylinski
M Sachana
MG Damale
N Brown
N Hecker
National Research Council
O Trott
OECD
P Benkert
P Sanz Leon
P Wexler
R Benigni
R Huang
R Kavlock
R Salomon-Ferrer
R Todeschini
RA Laskowski
Ramakrishnan Parthasarathi
RD Cramer
RR Tice
RS Judson
S Dutta
S Fowler
S Sakkiah
S Thangapandian
SA Adcock
SA Elmore
SF Altschul
SJ Sturla
SK Burley
T Madej
T Sterling
TB Knudsen
TEH Allen
U Schmidt
W Humphrey
Weihao Tang
WS Stokes
X-Y Meng
Yan Li
Publication venue: Springer Science and Business Media LLC
Publication date: 21/05/2019

Computational toxicology is a sub-discipline of toxicology concerned with the development and use of computer-based models and methodology to understand and predict chemical toxicity in a biological system (e.g., cells and organisms). Quantitative structure–activity relationship (QSAR) has been the predominant approach in computational toxicology. However, classical QSAR methodology has often suffered from low prediction accuracy, largely owing to the lack or non-integration of toxicological mechanisms. To address this lingering problem, we have developed a novel in silico toxicology approach that is based on molecular modeling and guided by mode of action (MoA). Our approach is implemented through a target-specific toxicity knowledgebase (TsTKb), consisting of a pre-categorized database of chemical MoA (ChemMoA) and a series of pre-built, category-specific classification and quantification models. ChemMoA serves as the depository of chemicals with known MoAs or molecular initiating events (i.e., known target biomacromolecules) and quantitative information for measured toxicity endpoints (if available). The models allow a user to qualitatively classify an uncharacterized chemical by MoA and quantitatively predict its toxicity potency. This approach is currently under development and will evolve to incorporate physiologically based pharmacokinetic (PBPK) modeling to address absorption, distribution, metabolism and excretion (ADME) processes in a biological system. The fully developed approach is believed to significantly advance in silico-based predictive toxicology and provide a new powerful toolbox for regulators, the chemical industry and the relevant academic communities.

Aquila Digital Community

Crossref